Automatic Identification of Infrequent Word Senses
Abstract
In this paper we show that an unsupervised method for ranking word senses automatically can be used to identify infrequently occurring senses. We demonstrate this using a ranking of noun senses derived from the BNC and evaluating on the sense-tagged text available in both SemCor and the SENSEVAL-2 English all-words task. We show that the method does well at identifying senses that do not occur in a corpus, and that those that are erroneously filtered but do occur typically have a lower frequency than the other senses. This method should be useful for word sense disambiguation systems, allowing effort to be concentrated on more frequent senses; it may also be useful for other tasks such as lexical acquisition. Whilst the results on balanced corpora are promising, our chief motivation for the method is its application to domain-specific text. For text within a particular domain, many senses from a generic inventory will be rare, and possibly redundant. Since a large domain-specific corpus of sense-annotated data is not available, we evaluate our method on domain-specific corpora and demonstrate that sense types identified for removal are predominantly senses from outside the domain.
Related resources
Automatic identification of words with novel but infrequent senses
We propose a statistical method for identifying words that have a novel sense in one corpus compared to another based on differences in their lexico-syntactic contexts in those corpora. In contrast to previous work on identifying semantic change, we focus specifically on infrequent word senses. Given the challenges of evaluation for this task, we further propose a novel evaluation method based ...
Novel Word-sense Identification
Automatic lexical acquisition has been an active area of research in computational linguistics for over two decades, but the automatic identification of new word-senses has received attention only very recently. Previous work on this topic has been limited by the availability of appropriate evaluation resources. In this paper we present the largest corpus-based dataset of diachronic sense diffe...
From the Culinary to the Political Meaning of "quenelle" : Using Topic Models For Identifying Novel Senses (De la quenelle culinaire à la quenelle politique : identification de changements sémantiques à l'aide des Topic Models) [in French]
In this study we explore topic modeling for the automatic detection of new senses of known words. We apply methods developed in previous work for English (Lau et al., 2012, 2014) on a recent case of new word sense induction in French, namely the appearance of the new meaning of gesture for the word « quenelle ». Our experiments illustrate the potential of this approach at learning word senses, ...
SemEval-2 Task 15: Infrequent Sense Identification for Mandarin Text to Speech Systems
There are seven cases of grapheme to phoneme in a text to speech system (Yarowsky, 1997). Among them, the most difficult task is disambiguating the homograph word, which has the same POS but different pronunciation. In this case, different pronunciations of the same word always correspond to different word senses. Once the word senses are disambiguated, the problem of GTP is resolved. There is ...
Experiments in Automatic Word Class and Word Sense Identification for Information Retrieval
Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus...